Abstract



We consider the problem of clock synchronization in a wireless setting where processors must power- 
down their radios in order to save energy. In this setting, each processor has a radio device that is 
either on or off. When the radio device of a processor is on, it is able to communicate with other 
processors in its range. However, turning the radio on results in a significant waste of energy, even 
when listening. Energy efficiency is a central goal in wireless networks, especially if energy resources are 
severely limited. This is indeed the case in sensor networks, ad-hoc networks, and many other wireless 
network settings. Consequently, multiple papers in the wireless and sensor network literature
aim at achieving clock synchronization in an energy-efficient manner. In other words, the
goal is to synchronize all clocks while minimizing the number of times a processor must switch its radio
on.

The problem of clock synchronization is an important problem in the field of distributed algorithms. 
In the current setting, the problem is to synchronize clocks of m processors that wake up in arbitrary 
time points, such that the maximum difference between wake up times is bounded by a positive integer 
n, where time intervals are appropriately discretized to allow communication of all processors that are 
awake in the same discrete time unit. (We remark that in this model we do not consider the issue of 
Broadcast Interference, which is a different problem known as radio broadcast problem.) The current 
model has received wide attention in the sensor network literature. Currently, the best-known result for
synchronization of single-hop networks of m processors is a randomized algorithm due to Bradonjic,
Kohler and Ostrovsky [2] with $O(\sqrt{n/m} \cdot \mathrm{polylog}(n))$ awake times per processor, together with a lower bound of
$\Omega(\sqrt{n/m})$ on the number of awake times needed per processor [2]. (Their algorithm synchronizes the
m processors with high probability, but may fail.) The main open question left in their work is to
close the polylogarithmic gap between the upper and the lower bound, to de-randomize their probabilistic
construction, and to eliminate the error probability. This is exactly what we do in this paper.

That is, we show a deterministic algorithm with radio use of $\Theta(\sqrt{n/m})$ that never fails (and has
a small hidden constant). We stress that our upper bound matches the lower bound proven
in [2] up to a small multiplicative constant. Therefore, our algorithm is optimal in terms of energy
efficiency and resolves a long sequence of works in this area. In order to achieve these results
we devise a novel adaptive technique that determines the times when devices power their radios on
and off. Our technique may be of independent interest in other clock synchronization problems.
While obtaining a deterministic bound with an extra logarithmic multiplicative term requires a new simple
idea, getting rid of this additional term requires nontrivial combinatorial work. The contribution of
this paper is both showing the simple new idea and then closing the gap to obtain matching upper and lower
bounds.

In addition, we prove several lower bounds on the energy efficiency of algorithms for multi-hop
networks. Specifically, we show that any algorithm for multi-hop networks must have radio use of $\Omega(\sqrt{n})$
per processor. Our lower bounds hold even for specific kinds of networks such as networks modeled
by unit disk graphs and highly connected graphs. Our results imply that the simple deterministic
algorithm devised for two-processor networks by Bradonjic et al. [2], with efficiency $O(\sqrt{n})$, can
be used in multi-hop networks, and that it is the most efficient solution in terms of energy use.

Topic classification: distributed algorithms, network algorithms.

1 Introduction



Problem description and motivation: In wireless networks in general, and in sensor and ad hoc 
networks in particular, minimizing energy consumption is a central goal. It is often the case that energy 
resources are very limited for such networks. Consider, for instance, a sensor network whose processors 
are fed by solar energy. In such cases devising energy efficient algorithms becomes crucial. A significant 
energy use of a processor takes place when its radio device is on. Then, it is able to communicate with 
other processors in its transmission range whose radio devices are also turned on. However, it wastes 
significantly more energy than it would waste if its radio device was turned off. For example, in typical 
sensor networks [26] listening to messages consumes roughly as much energy as fully utilizing the CPU, 
and transmitting consumes up to 2.5 times more energy. Moreover, if a processor runs in an idle mode, 
and its radio device is off, it consumes up to 100 times less energy than it would consume if its radio 
device was on. Therefore, the time that a processor can operate using an allocated energy resource largely 
depends on how often its radio is turned on. 

Processors in a wireless network may wake up at somewhat different time points. For example, in the 
sensor network powered by solar energy, processors wake up in the morning when there is enough light 
projected on their solar cells. If the processors are spread over a broad area, then there is a difference 
in the wake up times. The processors' clocks start counting from zero upon wake up. Since there is 
a difference in wake up times, the clocks get out of synchronization. However, many network tasks 
require that all processors agree on a common time counting. In such tasks processors are required 
to communicate only in certain time points, and may be idle most of the time. If the clocks are not 
synchronized, a certain procedure has to be invoked by each processor in order to check the status of 
other processors. During this procedure processors may turn their radio on constantly, resulting in a 
major waste of energy. Therefore, clocks must be synchronized upon wakeup in order to save energy and
to allow time-sensitive tasks to be executed. The clock synchronization itself must be as efficient as
possible in terms of energy use. It is desirable that among all possible strategies, each processor selects 
the strategy that minimizes its radio use. The energy efficiency of a processor is the number of time units 
in which its radio device is turned on. 

In this paper we devise energy efficient clock synchronization algorithms. The goal of a clock synchro- 
nization algorithm is setting the logical clocks of all processors such that all processors show the same 
value at the same time. In order to achieve this goal, each processor executes an adaptive algorithm, 
which determines the time points (with respect to its local clock) in which the processor will turn its 
radio device on, for a fixed period of time. Once a processor's radio device is on, it is able to communicate 
with other processors in its range whose radio devices are also on at the same time interval. Based on 
the received information a processor can adjust its logical clock, and determine additional time intervals 
in which its radio device will be turned on. This process is repeated until all processors are synchronized. 

Our Results: We consider single-hop networks of m processors, such that the maximum difference
between processors' wake up times is n (henceforth, the uncertainty parameter). We devise several
deterministic synchronization algorithms, the best of which has radio efficiency $O(\sqrt{n/m})$ per processor.
Our results improve the previous state-of-the-art algorithms devised by Bradonjic et al. [2]. In [2],
randomized algorithms for synchronizing single-hop networks were devised, whose energy efficiency is
$O(\sqrt{n/m} \cdot \mathrm{polylog}(n))$ per processor. Therefore, our deterministic results improve the best previous
randomized results by a polylogarithmic factor. Moreover, Bradonjic et al. proved a lower bound of
$\Omega(\sqrt{n/m})$ per processor for the energy efficiency of any deterministic clock synchronization algorithm for
single-hop networks. Hence, our algorithms are optimal in terms of radio use up to constant factors. In
addition, our algorithms do not employ heavy machinery, as opposed to the algorithm of [2], which employs
expanders and sophisticated probabilistic analysis. In contrast, we devise a combinatorial construction
that quickly "spreads" processors' radio use approximately equally in time, which surprisingly allows
them to synchronize more efficiently by chaining synchronization messages with each other.

We also prove lower bounds for multi-hop networks. We show that any deterministic synchronization
algorithm for an m-processor multi-hop network must have total radio use $\Omega(m \cdot \sqrt{n})$. In [2] a simple
deterministic algorithm for 2-processor networks was devised with energy efficiency $O(\sqrt{n})$ per processor.
This algorithm is extendable to m-vertex networks, in the sense that each processor learns the differences
between its clock and the clocks of its neighbors. The total radio efficiency of the extended algorithm
is $O(m \cdot \sqrt{n})$. As evident from our results, it is far from optimal for single-hop networks. However, for
multi-hop networks, its total radio efficiency $O(m \cdot \sqrt{n})$ is the best possible up to constant factors. Our
lower bounds hold even for very specific network types such as unit disk graphs and highly connected
graphs.

High-level ideas: In the synchronization algorithm for m processors devised in [2], each processor
determines by itself the time points in which it turns its radio on. The decision of a processor does
not depend in any way on the decisions of other processors. Such a non-adaptive strategy makes the
algorithm sub-optimal unless the number of processors is constant. Moreover, the decisions are made
using randomization, and, consequently, the algorithm may fail. (However, the probability of failure is
very low, since it is exponentially small in n.) In contrast, our algorithms are deterministic and
adaptive. In our algorithms, each processor periodically and deterministically decides on the time points in
the future in which it will turn its radio on. Each decision is made based on all the information the
processor has learned from communicating with other processors before the time of decision. Such a
strategy decreases the number of redundant radio uses. In other words, the radio of a processor is used
only if this processor is essential for synchronization, and no other processor can be used instead. Since
all processors use this strategy, the radio use of each processor is as small as possible.

In the optimal case, a processor i, i = 1, 2, ..., m, wakes up at global time $(i - 1) \cdot \lfloor n/m \rfloor$. Each
processor considers an (almost) exclusive time interval of length $O(n/m)$. In other words, it may turn
its radio on only within the first $O(n/m)$ time units from wake up. The number of time units in which
the radio is on is even smaller, specifically, $O(\sqrt{n/m})$. The sum of lengths of all considered intervals is
therefore $O(n)$. All the considered intervals cover the entire time interval starting at the wake up of the
earliest processor, and ending at the wake up of the latest one. Each processor has a time point in which
it overlaps with the next processor, i.e., in which both processors turn the radio on. In the described
case all processors are synchronized in a relay-race manner, where each processor is synchronized with the
processor that wakes up immediately after it. However, in general, the processors wake up at arbitrary
global times in the range [0, n]. Therefore, there may be dense time intervals, in which many processors
wake up, and sparse time intervals, in which few processors wake up, or even none at all. In this case
a difficulty arises due to the need to synchronize isolated intervals. We overcome this difficulty by
devising a more sophisticated synchronization algorithm.

Let V be an m-vertex set representing the processors of the network, and E an initially empty edge
set. Each time a pair of processors $u, v \in V$ communicate with each other, add the edge (u, v) to E. Once
the graph $\mathcal{G} = (V, E)$ becomes connected, all m processors can be synchronized. Each time a processor
turns its radio on, it communicates with other processors that also turn their radio on at the same time.
Consequently, additional edges are added to E, and the graph $\mathcal{G}$ changes. At all time points the graph $\mathcal{G}$
consists of clusters. Initially, each vertex is a cluster, and clusters are merged as time passes. Each time
a new cluster is formed, the clocks of the processors in the cluster are synchronized using our
cluster-synchronization procedures. Next, each processor selects exclusive (with respect to other processors in
the cluster) time points in the future in which its radio will be turned on. For a sufficient number of
points, such a selection guarantees that one of the processors in the cluster will turn the radio on at
the same time as a processor from another cluster. This results in merging of the clusters. Our
algorithms cause all clusters to merge very quickly into a single unified cluster that contains all m vertices.
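
The cluster-merging view above can be illustrated with a small union-find sketch. This is only a bookkeeping aid for the analysis, not part of the algorithm the processors run; the class name `Clusters` and its methods are assumptions introduced here for illustration, and processors are indexed from 0.

```python
class Clusters:
    """Union-find over processors 0..m-1 (illustrative bookkeeping only)."""

    def __init__(self, m):
        self.parent = list(range(m))  # initially each vertex is its own cluster
        self.count = m                # current number of clusters

    def find(self, u):
        # Follow parent pointers to the cluster representative (path halving).
        while self.parent[u] != u:
            self.parent[u] = self.parent[self.parent[u]]
            u = self.parent[u]
        return u

    def communicate(self, u, v):
        """Record the edge (u, v); merge the two clusters if they differ."""
        ru, rv = self.find(u), self.find(v)
        if ru != rv:
            self.parent[ru] = rv
            self.count -= 1

    def connected(self):
        # The graph is connected once a single cluster remains.
        return self.count == 1


clusters = Clusters(4)
for u, v in [(0, 1), (2, 3), (1, 2)]:
    clusters.communicate(u, v)
```

In this toy run, three communications suffice to merge four singleton clusters into one, at which point all processors could be synchronized.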

Related work: Clock synchronization is one of the most intensively studied and fundamentally
important fields in distributed algorithms. The aspect of energy efficiency of clock synchronization
algorithms was addressed in most of these works.
In [24] Polastre et al. devised an algorithm with energy efficiency $O(n)$ per processor. Each processor
simply turns its radio on for n + 1 consecutive time units upon wake up. Since the maximum difference
between wake up points is n, this guarantees that all processors are synchronized. More efficient solutions
were devised by Palchaudhuri and Johnson [23], and by Moscibroda, Von Rickenbach, and Wattenhofer
[20]. In these solutions, each processor turns its radio on for $O(\sqrt{n})$ time units that are randomly selected.
Their correctness is based on the birthday paradox, according to which there exists a time point that is
selected by two processors with high probability. In this time point both processors turn their radio on
and are able to synchronize.

Recently Bradonjic et al. [2] devised deterministic algorithms for synchronizing two processors with
efficiency $O(\sqrt{n})$. They also devised randomized algorithms for synchronizing m processors with efficiency
$O(\sqrt{n/m} \cdot \mathrm{polylog}(n))$ per processor. The polylogarithmic factor in the latter efficiency bound depends
on the probability of correctness, and grows as the probability grows. In addition, [2] also proves that any
deterministic algorithm for synchronizing m processors has energy efficiency $\Omega(\sqrt{n/m})$ per processor.

Additional synchronization problems that do not deal with energy consumption were studied in various
threads of research. However, their description is beyond the scope of this paper. For more information
see, for example, the surveys of Lenzen et al. [18], Sundararaman et al. [30], and Sivrikaya and Yener [28].

Structure of the paper: In Section 2 we describe the setting, building blocks and definitions 
used in our algorithms. Section 3 contains our synchronization algorithms. Section 4 contains the lower 
bounds. 

2 Preliminaries 

2.1 The Setting 

We use the following abstract model of a wireless network. We remark that although this abstract model 
is quite strong, it is sufficiently expressive to capture a more general case as explained in Appendix B. 
Global time is expressed as a positive integer, and is available for analysis purposes only. The network is
modeled by an undirected m-vertex graph G = (V, E). The processors of the network are represented
by vertices in V, and enumerated by 1, 2, ..., m. For each pair of processors u and v residing in the
communication range of each other there is an edge $(u, v) \in E$. Communication is performed in discrete
rounds. Specifically, time is partitioned into units of equal size, such that one time unit is sufficient for
a transmitted message to arrive at its destination. (At the physical level this can be relaxed such that
communication is possible if two processors turn their radio on during intervals that overlap for at least
one time unit.) A processor wakes up in the beginning of a time unit, and its physical and logical clocks
start counting from zero. The clocks of all processors tick with the same speed, and are incremented in
the beginning of each new time unit. The wake up times of the processors, and, consequently, the clock
values at a certain moment may differ. However, the maximum difference between the wake up times of
any two processors is bounded by an integer n, which is known to all processors. (In other words, each
processor wakes up with an integer shift in the range {0, 1, ..., n} from global time 0.) See Appendix B
for a discussion on more general cases. Specifically, the wakeup shifts may be non-integers, and the clock 
speeds may somewhat differ, as long as the ratio of different speeds is bounded by a constant. 

Each processor has a radio device that is either on or off during each time unit. If the radio device 
is off, its energy consumption is negligible. The energy efficiency of an algorithm is the number of time 
units during which the radio device of a processor is on. A pair of processors $(u, v) \in E$ is able to
communicate in a certain time unit t (with respect to global time) only if the radio devices of both u and
v are turned on during this time unit.

2.2 Algorithm Representation 

The running time f(n, m) of an algorithm is the worst case number of time units that pass from wake
up until the algorithm terminates. The algorithm specifies initial fixed time points for a processor to
turn its radio on. In addition, it adaptively determines new time points each time a processor turns its
radio on. The time points are determined by assigning strings to processors as follows. The strings of
the m processors are represented using a two-dimensional array A. The array A contains m rows. For
i = 1, 2, ..., m, the ith row belongs to the ith processor. The number of columns of A is n + f(n, m). All
cells of A are set to 0, except the cells that are explicitly set to 1. (Initially, all cells are set to 0.) The
algorithm specifies an initial fixed string $S_i$ for each processor i. For i = 1, 2, ..., m, suppose that processor
i wakes up at time $t_i$, $0 \le t_i \le n$, with respect to global time. Then the ith row of A is initialized as
follows. For $j = 0, 1, \ldots, |S_i| - 1$, set $A[i][t_i + j] = S_i[j]$. See Figure 1(a) below.

For j = 0, 1, 2, ..., at local time j, a processor i accesses the cell $A[i][t_i + j]$. A processor i turns
its radio device on at local time $k \ge 0$ if and only if $A[i][t_i + k] = 1$. If at global time t the radio
device of a processor i is on, then it can communicate with all processors j in its communication
range for which $A[j][t] = 1$. Based on the received information, processor i deterministically decides
whether to update cells in the row A[i]. It can update, however, only cells that represent time points
in the future, i.e., cells $A[i][t']$, for $t' > t$. Observe, however, that processor i is unaware both of global
time and of the shift $t_i$. (In particular, it is unaware of the index of the cell it is accessing in the row
A[i].) The algorithm terminates once all processors detect a column of ones, i.e., a column $\ell$ such
that for all $1 \le j \le m$, it holds that $A[j][\ell] = 1$. (Once all processors detect a column of ones, they
all turn their radio on at the same time, and synchronize their clocks.) A clock synchronization
algorithm $\mathcal{A}$ is correct if for all i = 1, 2, ..., m and all shifts $t_i \in \{0, 1, 2, \ldots, n\}$, once $\mathcal{A}$ is executed
by all processors there exists a column $\ell$ such that for all j = 1, 2, ..., m, $A[j][\ell] = 1$. See Figure 1(b).

Fig. 1. Example of the array A of three processors executing an algorithm with shifts $t_1 = 2$, $t_2 = 1$, $t_3 = 8$.
(a) The array A is initialized with the strings $S_1 = S_2 = S_3$ = '110101'. (b) The array A after the
termination of the execution.
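
As a rough illustration of the array representation, the sketch below builds the initial array for the shifts and strings of Fig. 1(a) and looks for overlaps and columns of ones. The function names and the choice of 14 columns are assumptions made for this example (the text uses n + f(n, m) columns), and rows are indexed from 0 rather than 1.

```python
def build_array(strings, shifts, num_cols):
    """Initialize A: row i holds string S_i placed at offset t_i, zeros elsewhere."""
    rows = []
    for s, t in zip(strings, shifts):
        row = [0] * num_cols
        for j, ch in enumerate(s):
            row[t + j] = int(ch)  # A[i][t_i + j] = S_i[j]
        rows.append(row)
    return rows


def overlaps(A, i, j):
    """Global time units in which processors i and j both have their radio on."""
    return [c for c in range(len(A[0])) if A[i][c] == 1 and A[j][c] == 1]


def columns_of_ones(A):
    """Columns in which every processor has its radio on."""
    return [c for c in range(len(A[0])) if all(row[c] == 1 for row in A)]


# Shifts and strings from Fig. 1(a); 14 columns is an arbitrary choice
# that is large enough to hold every shifted string.
A = build_array(["110101"] * 3, [2, 1, 8], 14)
```

With only the initial strings, the first two processors overlap at global time 2, while the third overlaps with neither and no column of ones exists yet; this is the situation that the adaptive updates to future cells are meant to resolve.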

2.3 Building Blocks and Definitions 

A radio use policy is a protocol for a processor $i \in \{1, 2, \ldots, m\}$ that determines the local time points in
which the processor i turns its radio on. For $\tau = 0, 1, 2, \ldots$, in the beginning of time unit $\tau$ from wakeup,
the processor i decides whether to turn its radio device on as explained above. (The decision process can
also be performed using a decision tree.)

For a fixed string s over the alphabet {0, 1} and a positive integer t, an (s, t)-radio-use policy of a processor
i determines the local time units in which i turns its radio on. For a processor i that wakes up at global
time $t_i$, we say that processor i performs an (s, t)-radio-use policy if it sets $A[i][t_i + t + j] = s[j]$, for
$j = 0, 1, \ldots, |s| - 1$, and turns its radio device on accordingly. (Recall that processor i turns its radio
device on at local time k if and only if $A[i][t_i + k] = 1$.) The processor starts performing the policy at
global time $t_i + t$. It completes the policy at global time $t_i + t + |s| - 1$. During this period a processor
may select new time points in the future in which additional policies will be performed.

Next, we define the notion of length, covering-weight, and covering-density of a policy. These defini- 
tions are used in the correctness analysis of the algorithms. 

The length of an (s, t)-radio-use policy p, denoted len(p), is the difference between the positions of the
first and last '1' in s plus one. (In other words, if j is the smallest index such that s[j] = 1, and k
is the largest index such that s[k] = 1, then len(p) = k - j + 1.) Intuitively, the length of a policy is
the time duration required for performing the policy. For the (s, t)-radio-use policy p, the string s is a
concatenation $s' \circ s''$ of two substrings, defined by p. The substring s' is called the initial part of s, and
the substring s'' is called the main part of s. We say that i performs the initial part of p in global time t'
if it performs the policy p, and the global time t' satisfies $t + t_i \le t' \le t + t_i + |s'| - 1$. If i performs the
policy p, and the global time t' satisfies $t + t_i + |s'| - 1 < t'$, we say that i performs the main part of p.

We say that two processors i and j overlap if there is a global time point t' in which both processors
turn their radio on. Two processors u and v can communicate (either directly or indirectly) if they
overlap, or if there exists a series of processors $w_1, w_2, \ldots, w_k$, with $w_1 = u$ and $w_k = v$, such that $w_i$ overlaps with
$w_{i+1}$, for i = 1, 2, ..., k - 1. If such a series does not exist, we say that there is a point of discontinuity
between u and v. A point of discontinuity is a global time point t' in which either (1) there is no processor
that performs a radio use policy, or (2) each processor that does perform a radio use policy completes it
at time t'. A global time interval (s', t') is continuous if there are no points of discontinuity in it. For a
continuous interval (s', t') such that s' and t' are discontinuity points, all processors performing a radio
use policy during the interval (s', t') form a cluster c. In this case we say that c covers the interval (s', t').
The length of a cluster c that covers an interval (s', t'), denoted len(c), is t' - s' + 1.

Each processor in a cluster adds weight to the cluster. Consequently, clusters containing many
processors are heavier than clusters containing few processors. The covering-weight of a cluster c, denoted
cwet(c), is the sum of lengths of policies of processors contained in c. Consider two clusters c and c' with
the same covering-weight, but such that the length of c is much shorter than the length of c'. Then
c' covers a much longer time interval. We show later in this paper that clusters that cover longer intervals
are 'better' in a certain way. Consequently, c' is better than c, although they have the same covering-weight.
On the other hand, a short and light cluster may be better than a long and heavy one. Therefore,
neither the length nor the covering-weight of a cluster is expressive enough to determine how 'good'
a cluster is. Hence, we add the notion of covering-density, which is the ratio between the covering-weight
and the length of a cluster. The covering-density of a cluster c, denoted cden(c), is $\mathrm{cwet}(c) / \mathrm{len}(c)$. Clusters of lower
covering-density are considered better clusters. (Observe that these definitions are different from the
usual definitions of string weight and density, in which only the number of ones in the string is counted.)

Next, we give similar definitions for intervals. The length of an interval q = (s', t'), denoted len(q), is
t' - s' + 1. Suppose that during interval q there are $\ell$ policies that are performed. (Possibly, some have
started before time s', and some have ended after time t'.) Let $q_1, q_2, \ldots, q_\ell$ be the intervals contained in
q in which the main parts of the policies are performed. The covering-weight of an interval q, denoted
cwet(q), is $\sum_{i=1}^{\ell} \mathrm{len}(q_i)$. The covering-density of the interval, denoted cden(q), is $\mathrm{cwet}(q) / \mathrm{len}(q)$.
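
To make the definitions of length, covering-weight and covering-density concrete, here is a minimal sketch. The helper names are assumptions, policies are given as 0/1 strings, and the cluster's covered interval is supplied by the caller rather than derived from discontinuity points.

```python
from fractions import Fraction


def policy_length(s):
    """len(p): distance between the first and last '1' in s, plus one."""
    ones = [i for i, ch in enumerate(s) if ch == "1"]
    return ones[-1] - ones[0] + 1


def cluster_weight(policy_strings):
    """cwet(c): sum of the lengths of the policies of the processors in c."""
    return sum(policy_length(s) for s in policy_strings)


def cluster_density(policy_strings, start, end):
    """cden(c) = cwet(c) / len(c), where c covers the interval (start, end)."""
    length = end - start + 1
    return Fraction(cluster_weight(policy_strings), length)


# Hypothetical cluster: two processors, each performing the string '110101',
# assumed to cover the global interval (1, 7).
density = cluster_density(["110101", "110101"], 1, 7)
```

Here each policy has length 6, so the covering-weight is 12 and the covering-density is 12/7. Stretching the same two policies over a longer interval would lower the density, which matches the intuition that such a cluster is 'better'.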

3 Synchronization Algorithms for Single-Hop Networks 

3.1 Procedure Synchronize 

In this section we present a deterministic synchronization algorithm for complete graphs on m vertices
with energy efficiency $O(\sqrt{n/m} \cdot \log n)$ per processor. In the next section we devise an algorithm with
energy efficiency $O(\sqrt{n/m})$ per processor. This result is optimal up to constant factors, as evident from
the matching lower bound $\Omega(\sqrt{n/m})$ [2]. As a first step, we define the following basic radio use policy for
a processor, consisting of two parts. Starting from local time t, for a given integer k > 0, turn the radio
device on for k consecutive time units (henceforth, the initial part). Then, for the following $k^2$ time units,
turn the radio on only once in each k consecutive time units (henceforth, the main part). In other words,
counting from the beginning of the main part, the radio is turned on during time units $k, 2k, 3k, \ldots, k^2$. This
completes the description of the policy. It will henceforth be referred to as the k-basic policy. Its pseudocode
is given below. The string s of the policy is defined by $s[i] = 1$ and $s[(i + 2) \cdot k - 1] = 1$ for $0 \le i < k$. The
length of a k-basic policy is $k + k^2$, but the number of time units in which the radio is used is only 2k. We
remark that the k-basic policy is defined for any positive integer k, but our algorithms employ policies
in which $k = \Theta(\sqrt{n/m})$.

Algorithm 1 Procedure Basic-Policy(k, T_v)

A k-basic policy for a processor v starting from local time point T_v
for i := 0, 1, 2, ..., k^2 + k - 1 do
    s[i] := 0
end for
for i := 0, 1, 2, ..., k - 1 do
    s[i] := 1
    s[(i + 2) · k - 1] := 1
end for
for each local time unit T := T_v, T_v + 1, ..., T_v + k^2 + k - 1 do
    turn the radio on in time unit T if and only if s[T - T_v] = 1
end for
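
The string built by Procedure Basic-Policy can be sketched in Python as follows. The function name `basic_policy` is an assumption; the sketch only constructs the string s and leaves out the radio control loop.

```python
def basic_policy(k):
    """Return the k-basic policy string s as a list of 0/1 of length k^2 + k."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    s = [0] * (k * k + k)
    for i in range(k):
        s[i] = 1                # initial part: k consecutive ones
        s[(i + 2) * k - 1] = 1  # main part: one radio use per block of k units
    return s


policy = basic_policy(5)
policy_str = "".join(str(bit) for bit in policy)
```

For k = 5 this yields a string of length 30 with 10 ones: five consecutive ones followed by five blocks of the form 00001.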



Consider a pair of processors u and v that wake up at the beginning of global time units $t_u$ and $t_v$,
respectively, such that $t_u \le t_v$. Suppose that both processors use the k-basic policy p upon wakeup, and
that $t_v - t_u < \mathrm{len}(p)$. Then, there is a global time unit t in which both processors turn their radio devices
on. In this case we say that the processors overlap. (See Figure 2 below.) We summarize this fact in the
next lemma.

Lemma 3.1. Suppose that processors u and v wake up at global time points $t_u \le t_v$, such that $t_v - t_u <
\mathrm{len}(p)$, and execute the k-basic policy p upon wake up, for an integer k > 0. Then u and v overlap.

Proof. We prove that the overlap occurs during the initial part of the policy performed by v. If $t_v - t_u < k$,
then at global time $t_v$ less than k time units have passed from the wake up times of both processors.
Hence the overlap occurs at time $t_v$, since both processors turn their radio on for k consecutive time units
upon wake up. Otherwise, $k \le t_v - t_u < \mathrm{len}(p)$. Since u turns its radio on in global time $t_u + \mathrm{len}(p) - 1$,
there exists a global time point $t \ge t_v$ in which u turns its radio on. Let t' be the smallest integer such that
$t' \ge t_v$, and u turns its radio on in global time t'. Observe that according to the k-basic policy it holds
that $t_v \le t' < t_v + k$, since during the policy execution there are no k consecutive time points in which u
does not turn the radio on. Since the processor v turns its radio on at global times $t_v, t_v + 1, \ldots, t_v + k - 1$,
the processors u and v overlap at time t'. □
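
Lemma 3.1 can be sanity-checked by brute force for small k. The sketch below is only an illustration under the assumption that both processors perform the k-basic policy immediately upon wake up; the function names are hypothetical.

```python
def basic_policy(k):
    """k-basic policy string: k ones, then one radio use per block of k units."""
    s = [0] * (k * k + k)
    for i in range(k):
        s[i] = 1
        s[(i + 2) * k - 1] = 1
    return s


def overlap_time(k, t_u, t_v):
    """Earliest global time at which both radios are on, or None if there is none."""
    s = basic_policy(k)
    on_u = {t_u + i for i, bit in enumerate(s) if bit}
    on_v = {t_v + i for i, bit in enumerate(s) if bit}
    common = on_u & on_v
    return min(common) if common else None


def lemma_holds(k):
    """Check every wake-up difference d with 0 <= d < len(p) = k^2 + k."""
    return all(overlap_time(k, 0, d) is not None for d in range(k * k + k))


checked = all(lemma_holds(k) for k in range(1, 9))
```

For k = 5 and a wake-up difference of 7, the earliest overlap is at global time 9, inside the initial part of the later processor. A difference of exactly len(p) = 30 gives no overlap, which is why the lemma needs the strict inequality.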



Fig. 2. Two processors perform the 5-basic policy, and overlap. (Each processor's policy string is
111110000100001000010000100001.)

We say that processors synchronize their clocks if after the time point of synchronization the
logical clocks of the processors show the same value at the same time. Any two overlapping processors
synchronize their clocks as follows. Each processor executes the following procedure, called Procedure
Early-Synch. During its execution the processor that began performing its radio policy later among the
two is synchronized with the other processor. In other words, the later processor updates its logical clock
to be equal to the logical clock of the earlier processor. (Observe that the clock value of the later processor
is not greater than that of the earlier processor; therefore, clocks do not go backwards.) To this end, each
processor maintains the local variables $id$, $\tau$, $J$, where $id$ is the unique identity number of the processor,
$\tau$ is the local clock value, and $J$ is the number of time units passed since the processor began performing
the current radio policy. The variable $\tau$ is updated in each time unit by reading the logical clock value
and assigning it to $\tau$. The variable $J$ is set to 0 each time the processor starts a radio use policy, and
is incremented in each time unit. Each time a processor turns its radio on, it transmits the message
$(id, \tau, J)$. Once a processor u receives a message $(id_v, \tau_v, J_v)$ from a processor v, processor u determines
whether it began its radio policy after v did. If so, u updates its local clock to $\tau_v$. If both processors
began their policy at the same time, then the clocks are synchronized to the clock of the processor with
the greater id. This completes the description of the procedure. The pseudocode is given below. The
next lemma states its correctness.

Algorithm 2 Procedure Early-Synch()

A protocol for vertex u that performs a policy, for any local round τ.
Initially, J_u = 0.

τ_u := local clock value of u
if radio device is on then
    send the message (id_u, τ_u, J_u)
end if
if received a message (id_v, τ_v, J_v) then
    if (J_u < J_v) or (J_u = J_v and id_u < id_v) then
        set local clock to τ_v
        J_u := J_v
    end if
end if
J_u := J_u + 1

Return J_u once the policy is completed.
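
One round of the message exchange in Procedure Early-Synch can be sketched as follows. The `Processor` class, its field names and the `exchange` helper are assumptions introduced for illustration; the update rule itself follows the pseudocode above.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    ident: int        # unique identity number (id)
    clock: int        # logical clock value (tau)
    elapsed: int = 0  # time units since the current policy began (J)

    def message(self):
        return (self.ident, self.clock, self.elapsed)

    def receive(self, msg):
        ident_v, clock_v, elapsed_v = msg
        # Adopt the sender's clock if it began its policy earlier,
        # or at the same time with a greater id.
        if self.elapsed < elapsed_v or (
            self.elapsed == elapsed_v and self.ident < ident_v
        ):
            self.clock = clock_v
            self.elapsed = elapsed_v

    def tick(self):
        # End of the time unit: both the clock and J advance by one.
        self.clock += 1
        self.elapsed += 1


def exchange(u, v):
    """Both radios are on in the same time unit: swap messages, then update."""
    msg_u, msg_v = u.message(), v.message()  # snapshot before any update
    u.receive(msg_v)
    v.receive(msg_u)


u = Processor(ident=1, clock=7, elapsed=7)  # began its policy earlier
v = Processor(ident=2, clock=3, elapsed=3)  # began its policy later
exchange(u, v)
```

After the exchange, v has adopted the clock of u, and since both clocks then advance together they keep showing the same value. Taking the message snapshot before either update mirrors the fact that both messages are sent within the same time unit.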



Lemma 3.2. Procedure Early-Synch executed by two overlapping processors synchronizes their clocks.

Proof. Recall that $t_u$ and $t_v$ are the global time points in which $u$ and $v$ begin performing their radio use
policies, respectively, and that $0 \le t_v - t_u < len(p)$. The correctness of the procedure follows from the fact
that both processors $u$ and $v$ overlap. If $t_v - t_u < k$, then the overlap occurs at global time $t_v$. Otherwise,
observe that the time interval containing the initial part of the policy of $v$ is of length $k$. It overlaps with
$u$, since in the radio policy of $u$ each sequence of zeroes is of length $k - 1$ followed by an occurrence of 1.
Hence, there is a global time unit $t$ in which the radio devices of both processors are turned on. In this
time unit each processor receives a message from the other one. Suppose that in time $t$ the processor
$v$ receives the message $(id_u, \tau_u, J_u)$ from $u$, and the processor $u$ receives the message $(id_v, \tau_v, J_v)$ from
$v$. Then exactly one processor updates its clock to the clock value of the other processor. Specifically, if
($J_u < J_v$) or ($J_u = J_v$ and $id_u < id_v$), then $u$ updates its clock to $\tau_v$, and the clock value of $v$ remains $\tau_v$.
Otherwise, $v$ updates its clock value to $\tau_u$, and the clock value of $u$ remains $\tau_u$. □
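The overlap argument in the proof can be checked by brute force for small $k$. The sketch below assumes a concrete shape for the $k$-basic policy that is consistent with the properties used in this section (an initial part of $k$ consecutive radio-on units, followed by a main part of $k$ blocks, each consisting of $k - 1$ zeroes and a single 1, for a total length of $k + k^2$). The exact definition is given earlier in the paper, so this layout and the function names are assumptions.

```python
def k_basic_policy(k):
    """Assumed layout: k ones, then k blocks of (k - 1 zeroes, one 1)."""
    initial = [1] * k
    main = ([0] * (k - 1) + [1]) * k
    return initial + main


def first_overlap(k, offset):
    """u starts at global time 0 and v at global time `offset`.

    Returns the first global time unit in which both radios are on,
    or None if no such unit exists while u is still running its policy.
    """
    policy = k_basic_policy(k)
    for t in range(offset, len(policy)):
        if policy[t] and policy[t - offset]:
            return t
    return None
```

Under this assumed layout, every offset smaller than the policy length yields an overlap, and for offsets below $k$ the overlap happens exactly when $v$ starts, as the proof states.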

Procedure Early-Synch can be generalized for synchronizing a cluster containing an arbitrary number
of processors. Recall that all processors in the cluster perform their policies in a time interval $(t, t')$
containing no discontinuity points. Hence, a message from a processor $u$ can be delivered to all processors
that begin performing their policy after $u$ does so. The message is received directly by all processors that
overlap with it, and is propagated in a relay-race manner to other processors. In this way all the processors
in the cluster can be synchronized with the processor that was the first to start performing its policy.

The generalized procedure is called Procedure Cluster-Synch. During its execution all processors
$u \in V$ perform the $k$-basic policy. A vertex $u$ starts performing its policy at local time point $T_u$ that
is passed to the procedure as an argument. (The argument is passed by another procedure that invokes
Procedure Cluster-Synch, which is described later in this section.) Each processor $u$ initializes a counter
$J_u$ that is set to 0 once the policy starts, and is incremented by 1 in each time unit. Recall that the local
clock of $u$ is represented by the variable $\tau_u$. Each time the radio device of a processor $u$ is on, it transmits
the message $(id_u, \tau_u, J_u)$. For each received message $(id_v, \tau_v, J_v)$ from a vertex $v$, if ($J_u < J_v$) or ($J_u = J_v$
and $id_u < id_v$), then $u$ updates its clock to $\tau_v$ and its counter $J_u$ to $J_v$. This completes the description
of Procedure Cluster-Synch. Its pseudocode is provided below. Its correctness is proven in Lemma 3.3. It
follows from the observation that all processors eventually synchronize their counters $J$ with the counter
of the earliest processor in the cluster.

Algorithm 3 Procedure Cluster-Synch($T_u$, $k$)
An algorithm for processor $u$.

1: Perform the $k$-basic policy starting from local time $T_u$
2: $J$ := Early-Synch()
3: return $J$



Lemma 3.3. For a fixed $k > 0$, suppose that processors $v_1, v_2, \ldots, v_\ell$ perform Procedure Cluster-Synch($T_{v_i}$, $k$),
with the parameters $T_{v_1}, T_{v_2}, \ldots, T_{v_\ell}$, respectively. If in the resulting execution the processors $v_1, v_2, \ldots, v_\ell$
form a cluster, then $v_1, v_2, \ldots, v_\ell$ synchronize their clocks to the clock of the earliest processor $v_1$.

Proof. Let $t_1, t_2, \ldots, t_\ell$ be the global times in which the processors $v_1, v_2, \ldots, v_\ell$ begin performing their
policy, respectively. Assume without loss of generality that $t_1 \le t_2 \le \ldots \le t_\ell$. Assume also that $v_1$
is the processor with the greatest id among the processors $v_j$ with $t_j = t_1$. We prove by induction on
$i = 1, 2, \ldots, \ell$ that a processor $v_i$ is synchronized with the earliest processor $v_1$ once $v_i$ completes the initial
part of its policy.

Base case ($i = 2$): Observe that $v_1$ and $v_2$ overlap, since there are no points of discontinuity in the
cluster. The overlap occurs during the execution of the initial part of the policy of $v_2$. Therefore, once
$v_2$ completes the initial part of its policy, it is synchronized with $v_1$.

Induction step: Suppose that once $v_{i-1}$ completes the initial part of its policy it is synchronized
with $v_1$. Since there are no points of discontinuity, the processors $v_{i-1}$ and $v_i$ overlap. Let $t$ be the last
global time point in which $v_{i-1}$ performs the initial part of its policy. (In other words, $v_{i-1}$ completes
the initial part of its policy at time $t$.) The last time point $t'$ in which $v_{i-1}$ and $v_i$ overlap occurs once
$v_{i-1}$ completes the initial part of its policy, or later. Hence, $t \le t'$. By the induction hypothesis, at time
$t$ the processor $v_{i-1}$ is synchronized with $v_1$. From this point on it remains synchronized with $v_1$.
Hence, at time $t' \ge t$, the processor $v_i$ receives a message with the clock value of $v_1$ and updates its clock
accordingly. □

Next, we consider the most general problem in which $m$ processors wake up at arbitrary global time
points in the time interval $[0, n]$. If each processor performs the $k$-basic policy upon wake up, then several
clusters may be produced. The processors in each cluster can be synchronized using Procedure Cluster-Synch.
However, the execution of Procedure Cluster-Synch will not synchronize processors from distinct
clusters since any two distinct clusters are separated by a discontinuity point. We devise a procedure,
called Procedure Synchronize, that merges these clusters gradually, until only a single cluster remains.
To this end, the parameter $k$ is selected to be large enough to guarantee that certain clusters have
large covering-density. The processors in a cluster with large covering-density schedule the next policy
execution times in a specific way that enlarges the length of the cluster to the maximal extent. Somewhat
informally, the cluster is extended roughly equally to both of its sides. In other words, there is an integer
$L > 0$ such that in the next phase the cluster begins $L$ time units earlier than in the previous phase, and
terminates $L$ units later than in the previous phase. For a precise definition see Algorithm 4, line 14.
The extension of the cluster to both of its sides prevents time drifts, and, consequently, in each phase
some clusters overlap. Overlapping clusters are merged into fewer clusters of greater covering-weight.

Algorithm 4 Procedure Flatten($J_u$, $k$)

A protocol for a vertex $u$, executed once $u$ completes the initial part of its policy

1: /*** First stage ***/
2: $J := J_u$
3: wait for $2n - J$ time units
4: /*** Second stage ***/
5: $B := \{(Id_u, J)\}$
6: transmit $(Id_u, J)$
7: for each received message $m = (Id_v, J')$ do
8:     $B := B \cup \{m\}$
9: end for
10: $B'$ := sort $B$ by Ids in ascending order
11: $len(c) := \max\{J' \mid (Id, J') \in B'\}$
12: $\mu$ := the position of $(Id_u, J)$ in $B'$
13: $\ell := |B|$
14: $next := 2n + \tau_u + \frac{len(c) - \ell \cdot k^2}{2} + \mu \cdot k^2$
15: return $next$ /* returned locally to the caller of this procedure */



The procedure for extending the length of a cluster is called Procedure Flatten. It is executed by
processors in a synchronized cluster $c$, and proceeds in two stages. The first stage (see Algorithm 4,
lines 1-3) is executed by each processor $u$ in the cluster once the processor $u$ is synchronized with the
first processor of the cluster $v_1$. (In other words, once $u$ completes the initial part of its $k$-basic policy.)
Then the counters $J$ of $u$ and $v_1$ are also synchronized. A processor $u$ schedules the next execution of
its policy to be executed in $2n - J$ time units. The second stage (Algorithm 4, lines 4-15) is executed
once $u$ performs the policy the next time. Observe that it is executed at the same time by all processors
in the cluster. All processors of the cluster turn their radio on and learn the number of processors in
the cluster, their ids, and the length of the cluster $len(c)$ with respect to the first stage. (We describe
how to determine $len(c)$ shortly.) Each processor sorts the ids and finds its position $\mu$ in the sorting. If
the current local time is $\tau$, and the number of processors in the cluster is $\ell$, it schedules the next policy
execution to local time

$$2n + \tau + \frac{len(c) - \ell \cdot k^2}{2} + \mu \cdot k^2,$$

and returns this value.

The length of the cluster $len(c)$ is equal to the difference between the global time points of the
beginning and the end of the cluster. Therefore, the length $len(c)$ is determined by the latest processor in
$c$. Once the latest processor $v_\ell$ completes its policy in the first stage, its counter $J_\ell$ (which is synchronized
with the counter of the earliest processor) is equal to the number of time units passed since the cluster
has started. Once $v_\ell$ completes its policy, the entire cluster $c$ is completed. Hence, at that moment, it
holds that $len(c) = J_\ell$. All processors learn this value in the second stage. (See step 11 in the pseudocode
of Algorithm 4.) This completes the description of the procedure. Its properties are summarized below.
See Figure 3 for an illustration.
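The scheduling rule of the second stage can be illustrated with a short Python sketch. It is not the authors' code: the function name `flatten_schedule`, the argument layout, and in particular the choice of a 0-based position $\mu$ are assumptions made for this example. Exact rational arithmetic is used so that the division by 2 does not silently round.

```python
from fractions import Fraction


def flatten_schedule(reports, my_id, tau, n, k):
    """Next policy start time for processor `my_id` (hypothetical helper).

    reports: list of (id, J) pairs learned in the second stage,
             including the processor's own pair.
    tau:     current local time (identical for all cluster members,
             since the cluster is already synchronized).
    """
    ordered = sorted(reports)                    # ascending by id
    cluster_len = max(j for _, j in ordered)     # len(c): largest counter J
    ell = len(ordered)                           # number of processors
    mu = [i for i, _ in ordered].index(my_id)    # assumed 0-based position
    return 2 * n + tau + Fraction(cluster_len - ell * k * k, 2) + mu * k * k
```

For example, with `n = 100`, `k = 2`, local time `tau = 200` and reports `[(7, 14), (3, 6), (5, 9)]`, the three processors are scheduled at 401, 405 and 409, that is, $k^2 = 4$ time units apart and centered on the middle of the old cluster.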

Lemma 3.4. Suppose that Procedure Flatten is executed by a cluster $c$ of $\ell$ processors that is formed in
a global time interval $[p, q]$. Then

(1) The second stage of Procedure Flatten is performed at global time $p + 2n$ by all processors of $c$.

(2) Performing the policies by the scheduling of the second stage forms a cluster $c'$ of length $\ell \cdot k^2$.

(3) The cluster $c'$ covers an interval that contains the interval $[4n + \frac{p+q}{2} - \frac{\ell k^2}{2},\ 4n + \frac{p+q}{2} + \frac{\ell k^2}{2}]$.



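[Figure 3: timeline of processors $v_1, \ldots, v_4$ showing the cluster $c$ and the flattened cluster $c'$; the time
axis is marked at $t_0$, $t_0 + 2n$ and $t_0 + 4n + (len(c) - \ell k^2)/2$.]

Fig. 3. Illustration of Procedure Flatten with four processors. ($\ell = 4$.)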

In order to synchronize $m$ processors that wake up at arbitrary times from the interval $[0, n]$, set
$k = \sqrt{8 \cdot n/m}$. Procedure Synchronize is performed in phases as follows. For $i = 1, 2, \ldots$, the $i$th phase
starts in global time $(i - 1) \cdot 4n$. In each phase, two iterations are performed. Initially, in the first iteration
of the first phase, each processor performs the $k$-basic radio policy upon wake up. Consequently, clusters
are formed in the interval $[0, 2n]$. Each cluster is synchronized using Procedure Cluster-Synch. In the
second iteration of the first phase, Procedure Flatten is performed. Then the next phase starts. In the
first iteration of each phase, the $k$-basic policy is performed by each processor starting from a time point
that was scheduled for it in the previous phase by Procedure Flatten. Consequently, new clusters are
formed and synchronized. In the second iteration, Procedure Flatten is performed, and schedulings for
the next phase are determined. Procedure Synchronize terminates once the interval $[i \cdot 4n, i \cdot 4n + 2n]$
is continuous, for an integer $i > 0$. A continuous cluster of length at least $2n$ necessarily contains all $m$
processors. Finally, Procedure Cluster-Synch is executed causing all $m$ processors to synchronize. This
completes the description of the procedure. The pseudocode of the procedure is given below.

(Footnote to Figure 3: Actually, at time $t_0$, it is sufficient for each processor to turn the radio on for a single
time unit. The figure reflects the description of the algorithm in which at time $t_0$ the entire $k$-basic policy
is performed, for analysis purposes.)



Algorithm 5 Procedure Synchronize()

An algorithm for a processor $v$

1: $k := \sqrt{8 \cdot n/m}$
2: $\tau := 0$
3: for $i = 1, 2, \ldots, \lceil \log n \rceil$ do
4:     $J$ := Cluster-Synch($\tau$, $k$)
5:     $\tau$ := Flatten($J$, $k$)
6: end for
7: Cluster-Synch($\tau$, $k$)



Procedure Synchronize preserves cluster distances in each phase in the following sense. Suppose that
two processors $u$ and $v$ wake up at global times $t_u$ and $t_v$ respectively. Then, for $i = 1, 2, \ldots, \log n$, there
are clusters $c_i$ and $c'_i$ such that $c_i$ covers an interval containing the point $(t_u + 4n \cdot i)$, and $c'_i$ covers an
interval containing the point $(t_v + 4n \cdot i)$. Moreover, the cluster $c_i$ contains the processor $u$, and the cluster
$c'_i$ contains the processor $v$. This observation, which is a consequence of Lemma 3.4, is summarized below.

Corollary 3.5. Suppose that a processor $v$ performs the $k$-basic policy in time $t \in [0, 2n]$. If a cluster $c$
covers an interval containing the time point $(t + 4n \cdot i)$ for some integer $i > 0$, then $c$ contains $v$.

In each phase of Procedure Synchronize, after the execution of Procedure Flatten, the sum of lengths
of produced clusters is at least $k^2 \cdot m > 2n$. Consequently, at least two clusters overlap in each phase,
and the number of clusters is decreased in each phase. Hence, $m$ phases are clearly sufficient to
merge all clusters into a single cluster. However, the merging process is actually much faster. The next
lemma states that after $\log n$ phases there is a single cluster containing all $m$ processors.

Lemma 3.6. Once Procedure Synchronize is executed the global time interval
$[\lceil \log n \rceil \cdot 4n, \lceil \log n \rceil \cdot 4n + 2n]$ is continuous.

Proof. Suppose without loss of generality that the length of the $k$-basic policy $k + k^2$ satisfies $k + k^2 < n$.
(Otherwise all processors overlap with the first awaking processor, and the problem becomes trivial.) In
the execution of Procedure Synchronize all processors perform the $k$-basic policy completely during the
interval $p_0 = [t_0, s_0] = [0, 2n]$. Hence, the covering-weight of the interval $p_0$ is at least $k^2 \cdot m \ge 8n$.
The covering-density of the interval is at least 4. We define a series of intervals $p_1 = [t_1, s_1], p_2 =
[t_2, s_2], \ldots, p_\lambda = [t_\lambda, s_\lambda]$ as follows. For $i = 0, 1, \ldots, \lambda - 1$, if
$cden([t_i, \lfloor \frac{1}{2}(t_i + s_i) \rfloor]) \ge cden([\lceil \frac{1}{2}(t_i + s_i) \rceil, s_i])$
then $p_{i+1} = [t_i, \lfloor \frac{1}{2}(t_i + s_i) \rfloor]$. Otherwise $p_{i+1} = [\lceil \frac{1}{2}(t_i + s_i) \rceil, s_i]$. Observe that $p_{i+1}$ is contained in $p_i$.
For $i = 1, 2, \ldots, \lambda$, the covering-density of $p_i$ is at least 4. See Figure 4 below for an illustration. Next,
we define another interval series $p'_0, p'_1, \ldots, p'_\lambda$ as follows. $p'_0 = p_\lambda$, and for $i = 1, 2, \ldots, \lambda$,
$p'_i = [t'_i, s'_i] = [t_{\lambda-i} + i \cdot 4n, s_{\lambda-i} + i \cdot 4n]$.

Set $\lambda = \lceil \log n \rceil$. We prove by induction on $i$ that $p'_i$ is continuous, for $i = 1, 2, \ldots, \lambda$.
Base ($i = 1$): Observe that the length of $p'_0$ is $len(p'_0) = len(p_\lambda) \le 2$. (Since $len(p_i) \le n/2^{i-1} + 1$.) Also
$cden(p_\lambda) \ge 4$. Hence, by Lemma 3.4 (3), once the clusters of the first phase are flattened the interval
$[4n + t_\lambda - len(p_\lambda), 4n + s_\lambda + len(p_\lambda)]$ is continuous. Hence, the interval $p'_1$ is continuous.
Induction step: By the induction hypothesis, assume that
$p'_{i-1} = [t_{\lambda-i+1} + (i-1) \cdot 4n, s_{\lambda-i+1} + (i-1) \cdot 4n]$
is continuous. Thus, there is a cluster $c$ that covers an interval containing $p'_{i-1}$. By Corollary 3.5, the
processors of clusters covering intervals that intersect with $p_{\lambda-i+1}$ are contained in $c$. Suppose that there
are $\ell$ processors in $c$. Since the covering-density of $p_{\lambda-i+1}$ is at least 4, it holds that
$\ell \cdot k^2 \ge 4 \cdot len(p_{\lambda-i+1}) = 4 \cdot len(p'_{i-1})$.
Hence, $\ell \cdot k^2 \ge 4 \cdot (s_{\lambda-i+1} - t_{\lambda-i+1}) = 4 \cdot (s'_{i-1} - t'_{i-1})$. By Lemma 3.4 (3), once the clusters
of phase $i$ are flattened the interval
$[i \cdot 4n + t_{\lambda-i+1} - len(p_{\lambda-i+1}), i \cdot 4n + s_{\lambda-i+1} + len(p_{\lambda-i+1})]$ is
continuous. Hence, the interval $p'_i = [t_{\lambda-i} + i \cdot 4n, s_{\lambda-i} + i \cdot 4n]$ is continuous. □



[Figure 4: time axis marked at $n'/4$, $3n'/8$, $n'/2$ and $n'$, showing the nested intervals $p_1$, $p_2$, $p_3$.]

Fig. 4. Illustration of the intervals $p_i$. It holds that $n' = 2n$.
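The halving construction of the intervals $p_i$ in the proof of Lemma 3.6 can be sketched as follows. The covering-density is defined earlier in the paper. Here it is approximated, purely for illustration, as $k^2$ times the number of policy start times inside the interval, divided by the interval length. This simplified `density`, the guard against zero-length intervals, and all function names are assumptions of this sketch.

```python
def density(starts, k, lo, hi):
    """Simplified stand-in for cden([lo, hi]) (an assumption of this sketch)."""
    weight = k * k * sum(1 for s in starts if lo <= s <= hi)
    return weight / max(hi - lo, 1)  # guard against zero-length intervals


def halving_intervals(starts, k, lo, hi, steps):
    """Repeatedly keep the denser half of [lo, hi]; returns p_1, ..., p_steps."""
    result = []
    for _ in range(steps):
        twice_mid = lo + hi
        left = (lo, twice_mid // 2)          # [t_i, floor((t_i + s_i) / 2)]
        right = (-(-twice_mid // 2), hi)     # [ceil((t_i + s_i) / 2), s_i]
        if density(starts, k, *left) >= density(starts, k, *right):
            lo, hi = left
        else:
            lo, hi = right
        result.append((lo, hi))
    return result
```

With start times `[0, 1, 2, 10, 50, 51, 52, 53, 60]`, `k = 2` and the initial interval `[0, 64]`, three steps give the nested intervals `(32, 64)`, `(48, 64)` and `(48, 56)`.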

By Corollary 3.5 and Lemma 3.6, all $m$ processors are synchronized during the global time interval
$[\lceil \log n \rceil \cdot 4n, \lceil \log n \rceil \cdot 4n + 2n]$. Each processor performs the $k$-basic policy a constant number of times in
each phase. Hence, in each phase, the number of time units in which each processor turns its radio on is
$O(k) = O(\sqrt{n/m})$. The properties of Procedure Synchronize are summarized in the following theorem.

Theorem 3.7. Procedure Synchronize performs clock synchronization of $m$ processors waking up at
arbitrary time points from the interval $[0, n]$. The energy efficiency of each processor is $O(\sqrt{n/m} \cdot \log n)$.
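The bound of Theorem 3.7 can be written out as a one-line calculation. It only combines facts stated above: the choice $k = \sqrt{8 \cdot n/m}$, a radio use of $O(k)$ per processor per phase, $\lceil \log n \rceil$ phases, and one final invocation of Procedure Cluster-Synch.

```latex
\[
  \underbrace{\lceil \log n \rceil}_{\text{phases}} \cdot\, O(k)
  \;+\; \underbrace{O(k)}_{\text{final Cluster-Synch}}
  \;=\; O\!\left(\sqrt{8 \cdot n/m} \cdot \log n\right)
  \;=\; O\!\left(\sqrt{n/m} \cdot \log n\right).
\]
```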

3.2 Procedure Dynamic-Synch 

In this section we show that by using more sophisticated procedures one can achieve energy efficiency of
$O(\sqrt{n/m})$ per processor. We start by describing a gas-stations riddle whose solution gives an intuition
for the main ideas of the procedures we devise in this section. Suppose that $m$ gas stations are arbitrarily placed on a
one-way circular road. The total amount of gas in all stations is sufficient for a car to complete exactly
two laps on the road. The car's gas tank is sufficiently large; hence, each time the car approaches a
station, it can add all the gas of the station to its tank. Can a car with an initially empty gas tank start
from one of the stations, and complete an entire lap? The answer to this riddle is affirmative. There
always exists such a station. To find the station, select an arbitrary station $p$ on the road. Place the
car at the earliest station $s \ne p$ before $p$ (with respect to the driving direction) such that the car is able
to arrive from $s$ at $p$. In other words, if the car is placed at a station that appears before $s$ it runs out of
gas before arriving at $s$. (If there is no such station, then a car placed at station $p$ can complete an entire
lap, and we are done.) The car drives from $s$ to $p$. When it arrives at $p$ it has enough gas to complete an
entire lap, since the gas stations in the interval from $p$ to $s$ do not have enough fuel to complete this interval.
Consequently, the fuel of the stations in the interval from $s$ to $p$ is sufficient for completing this interval
plus another complete lap.

In our algorithms the stations represent processors. Using the fuel of a certain station represents
turning the radio on by the appropriate processor. However, using $x$ units of fuel represents a radio use of
$O(\sqrt{x})$. For $i = 1, 2, \ldots, m$, the goal of a processor $i$ is to execute its radio use policy in the time interval
in which the car would use the gas of station $i$. Since in each time unit the car uses the gas from only one
station, there are no time units in which more than one main part of a policy is executed. However, for
a processor to be able to determine the appropriate intervals a more sophisticated flattening procedure
has to be used.
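The existence claim in the gas-stations riddle can be explored with a small Python sketch. Note that it uses the standard single-pass greedy search for a feasible start station rather than the backward search from an arbitrary station $p$ described above. The function names and the input encoding (`gas[i]` is the fuel at station `i`, `dist[i]` is the fuel needed to drive from station `i` to the next one) are assumptions of this example.

```python
def find_start(gas, dist):
    """Return a station from which a full lap is possible, or None."""
    if sum(gas) < sum(dist):
        return None  # not even one lap's worth of fuel on the road
    start, tank = 0, 0
    for i in range(len(gas)):
        tank += gas[i] - dist[i]
        if tank < 0:
            # No station in start..i can work; try the next one.
            start, tank = i + 1, 0
    return start


def completes_lap(gas, dist, start):
    """Check directly whether starting at `start` with an empty tank works."""
    tank = 0
    m = len(gas)
    for step in range(m):
        i = (start + step) % m
        tank += gas[i] - dist[i]
        if tank < 0:
            return False
    return True
```

In the riddle the stations hold fuel for two laps, so `sum(gas) == 2 * sum(dist)` and a valid start always exists. For instance, with `gas = [0, 1, 7, 0]` and `dist = [1, 1, 1, 1]` the search returns station 1.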

We devise a procedure called Dynamic Flattening. The use of dynamic flattening allows completing
the synchronization in two phases instead of the $O(\log n)$ phases that are required by Procedure Synchronize,
which was devised in the previous section. The main difference between Procedure Dynamic Flattening
and Procedure Flatten is that the scheduling stage is performed during the first execution of the policy
rather than at the end of a phase. This scheduling is performed only once, shortly after a processor
wakes up. A processor schedules the next execution of its policy to the first available free interval, i.e.,
an interval in which no other processor is scheduled. To this end, a queue of processors is maintained by
each cluster. Consequently, the next policy execution of each processor $v$ is scheduled in such a way that
the main part of $v$'s policy does not overlap with any of the other $m - 1$ processors when they execute
the main parts of their policies after scheduling. (In contrast, in Procedure Flatten the new scheduling
of phase $i$ guarantees only that the main part of the policy of $v$ does not overlap with any processor
in the cluster containing $v$ in phase $i$.) At time $2n$ at least $m/2$ processors are scheduled one after the
other to perform their policy. As a result, the global interval $[2n, 4n]$ is continuous. To guarantee that all
$m$ processors perform their policy during this interval, each processor performs an additional independent
invocation of its policy at time $2n$ from wake up.

The algorithm that employs this idea is called Procedure Dynamic-Synch.

Informal description of Procedure Dynamic-Synch (for each processor $v \in V$)

step 1. The vertex $v$ sets $k := \sqrt{8 \cdot n/m}$, and performs the initial part of the $k$-basic policy.

step 2. If during step 1 one of the following holds: (i) $v$ does not discover any other processor whose
radio is turned on, or (ii) all discovered processors have woken up after $v$ did, or have woken up at the
same time as $v$ but have smaller ids than that of $v$,
then $v$ initializes a cluster $c$ and an empty queue $q_c$. The processor $v$ enqueues itself on $q_c$ and starts the
main part of its policy once the initial part is complete.

step 3 (Dynamic Flattening). Otherwise, a queue $q$ is already initialized and maintained by the
processor $u$ currently executing the main part of its policy. (The queue $q$ was created by the earliest
processor in the cluster and passed in a relay-race manner. We stress that $u$ is not necessarily the earliest
processor in the cluster.) Then $v$ enqueues itself on $q$ by communicating with $u$, and receives the number
$\ell$ of processors that appear in $q$ before $v$. Suppose that $u$ has performed the main part of its policy for
$r$ rounds once communicating with $v$. Then $v$ schedules the next $k$-basic policy execution such that the
main part of its policy is executed in $(\ell - 1) \cdot k^2 - r$ time units. Such scheduling guarantees that policies
of processors in $q$ are executed one immediately after the other, in the order they appear in $q$.

step 4. Once a processor completes executing its main part it dequeues itself from $q$ and passes $q$ to the
next scheduled processor (with which it necessarily overlaps).

step 5. Execute the $k$-basic policy at time $2n + 1$ from wake up. (Independently of steps 1-4.)

This completes the description of the procedure. Its formal description and pseudocode are given in
Appendix A. Its properties are summarized below. (See Figure 5 for an illustration.)

Lemma 3.8. Suppose that $m$ processors wake up during the global time interval $[0, n]$, and execute
Procedure Dynamic-Synch. Then the following hold:

(1) for any pair of processors $u$ and $v$, their main parts are executed in distinct time intervals (that have
no common time points) in the global interval $[0, 2n]$,

(2) each cluster $c$ that covers an interval in $[0, 2n]$ satisfies $cden(c) \le 1$,

(3) there exists a cluster $c'$ that covers an interval containing the global time point $2n$. At global time $2n$
the queue of $c'$ contains at least $m/2$ processors.



Proof. (1) Let $c$ be the dynamic cluster formed by Procedure Dynamic-Synch that contains $u$. Let $t_c$ be
the global time of wake up of the earliest processor $w$ in $c$. (The time $t_c$ is also the start point of the
original cluster of $w$.) Suppose that a processor $u$ has $\ell_u$ processors in $c$ that wake up before $u$, or wake
up at the same time as $u$ but have greater ids. Procedure Dynamic-Flattening schedules the main part
of $u$ to be executed in the interval $I_u = (t_c + \ell_u \cdot k^2, t_c + (\ell_u + 1) \cdot k^2]$. If a processor $v \ne u$ belongs to $c$
then $\ell_v \ne \ell_u$. The interval in which the main part of $v$ is executed is $I_v = (t_c + \ell_v \cdot k^2, t_c + (\ell_v + 1) \cdot k^2]$.
Hence, it holds that $I_u \cap I_v = \emptyset$. If $v$ does not belong to $c$, then by definition $I_u$ and $I_v$ do not have
common time points.

(2) Consider a cluster $c$ that covers an interval $p = [s, t]$ in $[0, 2n]$. By (1) the main parts of policies in
$c$ occur in distinct time intervals. Consequently, the sum of lengths of the main parts is at most $len(p)$.
Hence $cden(c) \le 1$.

(3) At global time point $2n$ all processors have already woken up and entered queues of leaders. By (2)
at most $m/2$ processors performed the main part in the interval $[0, 2n]$. (Because $k^2 \cdot m/2 > 2n$. Thus, if
more than $m/2$ processors perform the main parts in the interval $[0, 2n]$, at least one cluster must have
density greater than 1. A contradiction.) Therefore there exists a cluster $c'$ covering an interval containing
the point $2n$. At time point $2n$ the queue of the current leader in $c'$ contains all processors that have not
executed the main part before time $2n$. Hence, it contains at least $m/2$ processors. □
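The interval assignment used in part (1) of the proof can be illustrated for a single cluster. The sketch below is a simplified model under stated assumptions: all listed processors are assumed to belong to the same dynamic cluster, and the function name and the `(id, wake_time)` input format are hypothetical.

```python
def main_part_intervals(processors, k):
    """Map each id to the half-open interval (start, end] of its main part.

    processors: list of (id, wake_time) pairs, all assumed to belong to
                one dynamic cluster.
    """
    # Earlier wake-up first; among equal wake-up times, greater id first,
    # so the rank of u equals the quantity l_u from the proof.
    order = sorted(processors, key=lambda p: (p[1], -p[0]))
    t_c = order[0][1]  # wake-up time of the earliest processor in the cluster
    return {
        pid: (t_c + rank * k * k, t_c + (rank + 1) * k * k)
        for rank, (pid, _) in enumerate(order)
    }
```

For processors `[(1, 5), (2, 5), (3, 9)]` and `k = 2` this yields the pairwise disjoint intervals `(5, 9]`, `(9, 13]` and `(13, 17]` for ids 2, 1 and 3, respectively.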



[Figure 5: two timelines over a common time axis, labelled "On wake up" and "After scheduling".]

Fig. 5. Illustration of Procedure Dynamic-Synch at time point $t$. Processors 1, 2, 3 have already performed
the scheduled main part of their policy. Processors 4, 5 are already scheduled, but have not performed their
scheduled main part yet. Processor 6 will be scheduled once the main part of processor 5 is performed.

By Lemma 3.8 (3), at global time point $2n$ at least $m/2$ processors are scheduled consecutively. Hence
the global interval $[2n, 4n]$ is continuous. All $m$ processors execute their policy during this interval.
Therefore all $m$ processors synchronize their clocks. Each processor executes the $k$-basic policy (fully or
partly) 3 times. The correctness of Procedure Dynamic-Synch is summarized in the next theorem.

Theorem 3.9. Procedure Dynamic-Synch synchronizes the clocks of $m$ processors that wake up in the
interval $[0, n]$. The energy efficiency per processor is $O(\sqrt{n/m})$.

4 Lower Bounds

In this section we show strong lower bounds for energy use in clock synchronization in general graphs.
We consider two scenarios. In the first scenario the energy efficiency of an algorithm is the maximum
energy efficiency of a processor. This is the scenario discussed in previous sections. In the second scenario
the energy efficiency of an algorithm is the average of energy consumed by the processors in the worst
case. Observe that the second scenario is weaker in the sense that an algorithm with energy efficiency
$O(k)$ in the second scenario may have energy efficiency $\omega(k)$ in the first scenario. The goal of an efficient
algorithm in the first scenario is minimizing the maximum radio use of a processor. On the other hand,
the goal in the second scenario is minimizing the sum of energy used by all processors. We prove our
lower bounds for both scenarios. Moreover, our lower bounds apply not only to general graphs but also
to specific families of graphs that are used to model wireless networks, such as unit disk graphs. We
require the following results from [2].

Lemma 4.1. [2] Suppose that each processor $v_1, v_2, \ldots, v_m$ in the complete graph of $m$ processors turns
its radio on for $o(\sqrt{n/m})$ time units. Then for any deterministic synchronization algorithm $A$ there are
global time points $t_1, t_2, \ldots, t_m \in [0, n]$ of wake up and execution of $A$ by $v_1, v_2, \ldots, v_m$, respectively, such
that no two processors overlap.

Lemma 4.2. [2] In a two-processor network, for any radio use policy used by two processors $u$ and $v$, if
$u$ and $v$ turn their radio on for $o(\sqrt{n})$ times each, there exist waking up global times $t_u, t_v \in [0, n]$ of $u$
and $v$ respectively, such that $u$ and $v$ do not overlap.

We start by considering the first scenario in which the energy efficiency of the algorithm is the
maximum energy efficiency of a processor. Lemma 4.2 implies that a synchronization of a two-processor
network has energy efficiency $\Omega(\sqrt{n})$. Consequently, a synchronization of any $m$-vertex network that
contains an isolated vertex $w$ (a vertex with degree 1) has radio efficiency $\Omega(\sqrt{n})$ per processor. Otherwise,
if all processors have radio efficiency $o(\sqrt{n})$, then there are global time points $t_w, t'_w \in [0, n]$ such that
$w$ wakes up at time $t_w$, the neighbor of $w$ wakes up at time $t'_w$, and $w$ does not synchronize with its
neighbor. Hence, if the goal is minimizing the maximum radio use per processor then any algorithm for
general graphs has efficiency $\Omega(\sqrt{n})$ per processor.

Next, we consider the second scenario in which the energy efficiency of the algorithm is the average
of energy consumed by the processors. Surprisingly, we get the same result even for this weaker scenario.
Let $G' = (V', E')$ and $G'' = (V'', E'')$ be complete graphs of $m' = m'' = m/2$ vertices each. Suppose for
contradiction that there exists a synchronization algorithm $A$ for general graphs of $m$ processors in which
the sum of radio use of all processors is $o(m\sqrt{n})$. Then in any invocation of $A$ on a graph $G = (V, E)$,
there is a processor $v \in V$ whose radio use is $o(\sqrt{n})$. Suppose that all processors of $G'$ wake up at global
time $t'$, and all processors of $G''$ wake up at global time $t''$. Let $X'$ denote an execution of $A$ on $G'$, and
$X''$ the execution of $A$ on $G''$. There is a vertex $v' \in V'$ (respectively $v'' \in V''$) whose radio use in the
execution $X'$ (resp., $X''$) is $o(\sqrt{n})$.

Fig. 6. The graph $G$ is obtained by connecting the vertex $v'$ of $G'$ and the vertex $v''$ of $G''$.

Consider the graph $G = (V' \cup V'', E' \cup E'' \cup \{(v', v'')\})$ that is obtained from $G'$ and $G''$ by connecting
the vertices $v'$ and $v''$. (See Figure 6.) For $i, j \in \{0, 1, \ldots, n\}$ let $X(i, j)$ be the execution of $A$ on $G$ where
all processors of $V'$ wake up at time $i$, and all processors of $V''$ wake up at time $j$. Since $A$ synchronizes
all processors of $G$, the processors $v'$ and $v''$ overlap in each execution $X(i, j)$. Consider a two-vertex
network $H$ consisting of a connected pair of vertices $u, w$. The vertex $u$ simulates the graph $G'$, and $w$
simulates $G''$. The vertex $u$ (respectively, $w$) turns its radio on if and only if $v'$ (resp., $v''$) turns its radio
on. Once an algorithm $A$ is invoked by $u$ (respectively, $w$) it simulates locally the execution of $A$ on
$G'$ (respectively, $G''$). For any execution $X(i, j)$ on $H$, the processors $u$ and $w$ overlap, since $v'$ and $v''$
overlap in the execution of $A$ on $G$. In each execution, each of the processors $u$ and $w$ has a radio use of
$o(\sqrt{n})$. This is a contradiction to Lemma 4.2. Hence, at least $m/2$ vertices in $G$ (either all vertices of $G'$
or all vertices of $G''$) must have radio efficiency $\Omega(\sqrt{n})$. We summarize this discussion in the following
theorem.

Theorem 4.3. In any clock synchronization algorithm $A$ for general graphs the sum of the processors' radio
use is $\Omega(m \cdot \sqrt{n})$. The energy efficiency of $A$ in both scenarios is $\Omega(\sqrt{n})$.

Observe that the construction described above applies also to unit disk graphs, i.e., graphs in which all
vertices are placed in the plane, and have the same transmission range. Specifically, let $r$ be the radius
of transmission in a unit disk graph. Place the vertices of $V'$ on the border of a circle $c'$ of radius $1/2 \cdot r$.
Similarly, place the vertices of $V''$ on the border of a circle $c''$ of radius $1/2 \cdot r$. Place $v'$ and $v''$ at distance
$r$ from each other, such that all other vertices $u' \in V'$, $u'' \in V''$ are at distance greater than $r$
from each other. The lower bound in Theorem 4.3 applies to this construction as well.

Lemma 4.4. In any clock synchronization algorithm $A$ for unit disk graphs the sum of the processors' radio
use is $\Omega(m \cdot \sqrt{n})$. The energy efficiency of $A$ in both scenarios is $\Omega(\sqrt{n})$.

Next, we devise a lower bound for a yet narrower family of graphs. An $\ell$-connected graph is a graph
in which there are at least $\ell$ edge-disjoint paths connecting any pair of vertices. Consider an $m = 2m'$
vertex graph $G$ consisting of two complete graphs $G' = (V' = \{v'_1, v'_2, \ldots, v'_{m/2}\}, E')$ and $G'' = (V'' =
\{v''_1, v''_2, \ldots, v''_{m/2}\}, E'')$. Let $\ell$ be a positive integer parameter such that $\ell < m/4 - 2$. For $i = 1, 2, \ldots, \ell + 2$,
the vertices $v'_i$ and $v''_i$ are connected. For $i, j > \ell + 2$, the vertices $v'_i$ and $v''_j$ are not connected. It is easy to
see that $G$ is an $\ell$-connected graph. (See Figure 7 below.) Suppose that each vertex $v'_1, v'_2, \ldots, v'_\ell, v''_1, v''_2, \ldots, v''_\ell$
turns its radio on for $o(\sqrt{n/\ell})$ time units. Then, by Lemma 4.1, for any synchronization algorithm,
there are time points such that no two processors among $v'_1, v'_2, \ldots, v'_\ell, v''_1, v''_2, \ldots, v''_\ell$ overlap. Since the
endpoints of each edge that connects $G'$ and $G''$ belong to $\{v'_1, v'_2, \ldots, v'_\ell, v''_1, v''_2, \ldots, v''_\ell\}$, the network is not
synchronized. Thus, if there are at least $\ell$ vertices in $G'$ that have radio use $o(\sqrt{n/\ell})$ each, and at least $\ell$
vertices in $G''$ that have radio use $o(\sqrt{n/\ell})$ each, the network is not synchronized. Consequently, at least
$m - 2\ell + 1 = \Omega(m)$ vertices must use the radio for $\Omega(\sqrt{n/\ell})$ time units each in order to synchronize the
network. This result is stated in the following theorem.




Fig. 7. The graph G is obtained by connecting the vertex v[ of G' and the vertex v" of G" , for i = 
1,2,. ..,£ + 2. 

Theorem 4.5. For a positive integer parameter $\ell < m/4 - 2$, in any clock synchronization algorithm A 
for $\ell$-connected graphs, the sum of the processors' radio use is $\Omega(m \cdot \sqrt{n/\ell})$. The energy 
efficiency of A in both scenarios is $\Omega(\sqrt{n/\ell})$. 
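
The construction behind Theorem 4.5 can be checked on small instances. The following Python sketch is only an illustration under stated assumptions: the vertex labels ("a", i) for $v'_i$ and ("b", i) for $v''_i$, and the helper names build_graph and edge_disjoint_paths, are hypothetical and not part of the paper. It builds the two cliques with $\ell + 2$ connecting edges and counts edge-disjoint paths with unit-capacity augmenting paths.

```python
from collections import deque
from itertools import combinations


def build_graph(m, ell):
    """Two cliques on m/2 vertices each, joined by ell + 2 matching edges."""
    half = m // 2
    left = [("a", i) for i in range(1, half + 1)]    # stands for v'_i
    right = [("b", i) for i in range(1, half + 1)]   # stands for v''_i
    edges = set()
    for side in (left, right):
        edges.update(frozenset(pair) for pair in combinations(side, 2))
    for i in range(1, ell + 3):                      # i = 1, ..., ell + 2
        edges.add(frozenset((("a", i), ("b", i))))
    return left + right, edges


def edge_disjoint_paths(vertices, edges, s, t):
    """Number of edge-disjoint s-t paths (max flow with unit capacities)."""
    cap = {}
    adj = {v: set() for v in vertices}
    for e in edges:
        u, w = tuple(e)
        cap[(u, w)] = 1
        cap[(w, u)] = 1
        adj[u].add(w)
        adj[w].add(u)
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for w in adj[u]:
                if w not in parent and cap[(u, w)] > 0:
                    parent[w] = u
                    queue.append(w)
        if t not in parent:
            return flow
        w = t
        while parent[w] is not None:
            u = parent[w]
            cap[(u, w)] -= 1
            cap[(w, u)] += 1
            w = u
        flow += 1


m, ell = 16, 1                                       # ell < m/4 - 2 = 2
vertices, edges = build_graph(m, ell)
across = edge_disjoint_paths(vertices, edges, ("a", 8), ("b", 8))
weakest = min(edge_disjoint_paths(vertices, edges, s, t)
              for s, t in combinations(vertices, 2))
heavy_users = m - 2 * ell + 1                        # count used in the argument
```

In this instance the only paths between the two cliques run through the $\ell + 2 = 3$ connecting edges, so the pair ("a", 8), ("b", 8) has exactly three edge-disjoint paths, and every pair has at least $\ell$ of them.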

5 Conclusion 

In this paper we have devised optimal radio-use deterministic algorithms for clock synchronization in 
single-hop networks with energy efficiency $\Theta(\sqrt{n/m})$. We also proved lower bounds of $\Omega(\sqrt{n})$ 
for multi-hop networks. Our results suggest that in order to beat this bound of $\Omega(\sqrt{n})$, each 
neighborhood in the graph must be highly connected, containing no isolated regions. For wireless networks, 
this requires a certain level of uniformity in the distribution of the processors. In other words, for each 
processor u, each neighbor of u must be in the communication range of a significant number of other neighbors of u. 

In [2] a deterministic synchronization algorithm was devised for two-processor networks with efficiency 
$O(\sqrt{n})$. This algorithm can also be used in multi-hop networks in order to synchronize each processor 
with its neighbors. The energy efficiency in this case is $O(\sqrt{n})$ per processor. Somewhat surprisingly, 
our lower bounds imply that this simple approach is optimal in general multi-hop networks. 
Appendix 



A Procedure Dynamic-Synch 

The pseudocode of Procedure Dynamic-Synch is given below. Each vertex maintains the local variables 
candidate, winner, and q. During the execution of the main part by a processor v, the processor v is 
called a temporary leader. The goal of the procedure is to guarantee that there is at most one leader at 
any time point. However, during the execution of the algorithm there may be numerous leaders, since 
different processors may become temporary leaders at distinct time intervals. To this end, each vertex 
u initially sets its local variable candidate to true, i.e., it is a candidate for leadership. Then it sends 
an initial message for k rounds. If u receives a response from a leader, u cannot become a leader in this 
phase. Hence, u sets the variable candidate as false. If, on the other hand, u does not receive a response 
from a leader during the initial phase (the first k rounds), it can become a leader. However, other 
processors may try to do so concurrently. In order to select exactly one leader at a time, a local variable 
winner is maintained by each processor. Similarly to leadership, winning is a temporary state. In other 
words, at any time point there is at most one winner, but in different time points there may be distinct 
winners. A processor u is set as temporary winner only if in the first round performed by u there are no 
other winners, or if all other potential winners have smaller Ids. (Consequently, these potential winners 
lose.) 
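
The winner rule can be read off the first-round test in Algorithm 6. The sketch below is one hedged reading of that test, not the authors' code: the function name remains_winner and the representation of received messages as (sender Id, sender round) pairs are assumptions made for illustration.

```python
def remains_winner(own_id, heard_in_first_round):
    """Decide whether a processor keeps winner = true after its first round.

    heard_in_first_round lists (sender_id, sender_round) pairs taken from the
    initial messages received while the processor is in its own round 1.
    """
    for sender_id, sender_round in heard_in_first_round:
        started_earlier = sender_round > 1
        same_start_larger_id = sender_round == 1 and sender_id > own_id
        if started_earlier or same_start_larger_id:
            return False
    return True


silent_channel = remains_winner(5, [])             # nobody else is around
smaller_rival = remains_winner(5, [(3, 1)])        # rival woke up together, smaller Id
larger_rival = remains_winner(5, [(9, 1)])         # rival woke up together, larger Id
earlier_rival = remains_winner(5, [(3, 4)])        # rival is already in its round 4
```

Under this reading a processor loses as soon as it hears a processor that started earlier, or one that started in the same round with a larger Id.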

A temporary winner candidate becomes a temporary leader once its initial part is complete. It sends a 
response for each initial message it receives from other processors. The response contains the information 
required for the other processor to schedule a time interval in which it can become a temporary leader and 
execute its main part exclusively. To this end, a temporary leader u maintains a queue q, that initially 
contains only Id(u). The queue q represents all processors that are already scheduled to perform their 
main parts, but have not completed the main parts yet. Each received message with an Id of another 
processor v is enqueued on q. A response is sent with the position of Id(v) in q. Consequently, any two 
distinct processors receive from u a distinct position, and schedule the executions of their main parts to 
distinct intervals. Moreover, once u completes its main part it pops its Id from q, and passes q to the 
next temporary leader that is scheduled right after u. Consequently, any two processors schedule distinct 
time intervals for their main parts, even if they communicate with different leaders. 
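
The queue handling described above can be sketched as follows. This is a toy model under assumptions: the class name TemporaryLeader, its methods, and the choice to return all positions as a dictionary are hypothetical, and radio transmission is replaced by plain function calls.

```python
from collections import deque


class TemporaryLeader:
    """Toy model of the queue q kept by a temporary leader."""

    def __init__(self, leader_id, inherited=None):
        # A fresh leader starts with only its own Id; a successor starts
        # from the queue passed on by the previous temporary leader.
        self.q = deque(inherited) if inherited is not None else deque([leader_id])

    def main_round(self, heard_ids):
        """Enqueue unseen Ids in ascending order and report 1-based positions."""
        for sender in sorted(heard_ids):
            if sender not in self.q:
                self.q.append(sender)
        return {pid: pos for pos, pid in enumerate(self.q, start=1)}

    def finish(self):
        """Pop the leader's own Id and return the queue to pass on."""
        self.q.popleft()
        return deque(self.q)


leader = TemporaryLeader(7)
first = leader.main_round({9, 3})        # queue becomes 7, 3, 9
second = leader.main_round({3, 5})       # 3 keeps its position, 5 is appended
passed = leader.finish()                 # 7 leaves, the rest is handed over
successor = TemporaryLeader(3, inherited=passed)
third = successor.main_round({8})        # 8 joins behind 3, 9, 5
```

Because an Id is enqueued at most once and the queue is handed over intact, no two waiting processors ever hold the same position with respect to the same leader.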

The exact computation of the time interval to perform the main part is performed by procedure 
Dynamic-Flattening as described above. Its pseudocode follows. The procedure accepts as input the 
variables k, candidate, winner, q, $\ell$, dif. The variable $\ell$ is the position of the processor in the queue of 
the temporary leader. The variable dif is the difference between the number of rounds the main part of 
the leader has executed, and the number of rounds that current processor has executed. Based on this 
information the processor schedules the time of execution of its main part, and becomes a temporary 
leader during this period. 
Algorithm 6 Procedure Dynamic-Synch() 

An algorithm for a processor v. The rounds are counted from wakeup. 

k := 1/8 · n/m ; candidate := true ; winner := true ; q := {Id(v)} 
/*** initial part ***/ 
for rounds r := 1, 2, ..., k do 
    transmit the message initial(Id(v), r) 
    for each received message initial(Id(u), $r_u$) do 
        /* local processing of messages is by ascending order of Ids */ 
        if (r = 1) and ($r_u$ > r or ($r_u$ = r and Id(u) > Id(v))) then 
            winner := false 
        end if 
        if Id(u) is not in q then 
            q.enqueue(Id(u)) /* q[|q| + 1] := Id(u) */ 
        end if 
    end for 
    if candidate and received the message initial-response(Id(v), $\ell$, $r'$) then 
        candidate := false 
        Dynamic-Flattening(k, candidate, winner, q, $\ell$, $r' - r$) 
    end if 
    if r = k and candidate and winner then 
        for j := 1, 2, ..., |q| do 
            transmit the message initial-response(q[j], j, 0) 
        end for 
        Dynamic-Flattening(k, candidate, winner, q, , 0) 
    end if 
end for 
execute the k-basic policy independently starting from round 2n + 1 
Algorithm 7 Procedure Dynamic-Flattening(k, candidate, winner, q, $\ell$, dif) 

An algorithm for a processor v. The rounds are counted from wakeup. 

if candidate and winner then 
    next := k 
else 
    next := $(\ell - 1) \cdot k^2$ - dif 
end if 
/*** main part ***/ 
for rounds r := next + k, next + 2k, ..., next + $k^2$ do 
    if (r = next + k) and not (candidate and winner) then 
        receive the message pass(q') 
        q := q' 
    end if 
    for each received message initial(Id(u), $r_u$) do 
        /* local processing of messages is by ascending order of Ids */ 
        if Id(u) is not in q then 
            q.enqueue(Id(u)) 
        end if 
    end for 
    for j := 1, 2, ..., |q| do 
        transmit the message initial-response(q[j], j, r - next) 
    end for 
end for 
on round next + $k^2$ + k perform q.dequeue() and transmit the message pass(q) 
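
The effect of the assignment to next in Procedure Dynamic-Flattening can be illustrated with a small calculation. The sketch below rests on assumptions that are not stated in the pseudocode itself: wake-up shifts are integers, a processor that wakes at global time w is in local round g - w at global time g, and dif is taken to be the round count reported by the leader minus the receiver's own round count, following the prose description. All function and variable names are hypothetical.

```python
def flattening_next(k, is_leader, ell=None, dif=None):
    """Value of the variable next, as a local round number."""
    if is_leader:                      # candidate and winner
        return k
    return (ell - 1) * k * k - dif


def main_part_window(wake_time, nxt, k):
    """Global times of the first and last main-part rounds (next + k, next + k^2)."""
    return wake_time + nxt + k, wake_time + nxt + k * k


k = 4
leader_wake, v_wake = 0, 9
leader_next = flattening_next(k, is_leader=True)             # 4

# At global time 12 the leader is in its round next + 2k and reports r - next.
global_now = 12
reported = (global_now - leader_wake) - leader_next          # 8
v_round = global_now - v_wake                                # 3, within v's initial part
dif = reported - v_round                                     # 5
v_next = flattening_next(k, is_leader=False, ell=2, dif=dif)  # 11

leader_window = main_part_window(leader_wake, leader_next, k)  # (8, 20)
v_window = main_part_window(v_wake, v_next, k)                 # (24, 36)
pass_time = leader_wake + leader_next + k * k + k              # 24
```

In this example the processor in position 2 starts its main part exactly $k^2$ rounds after the leader's, and its first main-part round coincides with the round in which the leader transmits pass, so the two windows do not intersect.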
B General Scenarios 



We begin by analyzing the scenario in which the wake-up shifts are not necessarily integers. In 
this scenario each processor $v$ wakes up at global time $t_v \in [0, n) \subset \mathbb{R}$, and all clocks proceed 
at the same speed. In this scenario it is impossible to achieve precise synchronization, since it is impossible to 
adjust a clock by an arbitrarily small fraction of a unit. Therefore, the goal in this case is to set all clocks 
$\tau_1, \tau_2, \ldots, \tau_m$ such that at any time point the difference between $\tau_i$ and $\tau_j$ is at most 1, 
for $1 \le i \ne j \le m$. 

Next, we describe an additional step that each processor must perform each time it updates its clock. 
Otherwise, there might be a difference between clocks that may grow significantly as multiple updates 
are performed. The additional step guarantees that the difference is always smaller than one time unit. 
Let $t$ be the maximum time required for a sent message to arrive at its destination. A time unit is set to 
$2t$. Consequently, if two processors turn their radios on, and overlap for at least half a unit, then they are 
able to communicate. A processor $u$ maintains an additional variable $q_u$ that holds a value from the range 
$[-\frac{1}{2}, \frac{1}{2}]$ that is initially set to 0. Each time a processor $v$ communicates with a processor $u$ 
it determines the length $q$ of the overlap, which is a fraction in the range $[\frac{1}{2}, 1]$. Next, it sets 
$q' := 1 - q$ if $u$ turned its radio on after $v$ did. Otherwise it sets $q' := q - 1$. Observe that $q'$ is a 
fraction in the range $[-\frac{1}{2}, \frac{1}{2}]$. It represents the time length between a clock tick of $u$ and 
a clock tick of $v$. Specifically, if $v$ increments its clock at global time $t$, then $u$ increments its clock at 
global time $t + q'$. 

Suppose that a processor $u$ communicates with a processor $v$, and updates its clock $\tau_u$ to the clock 
$\tau_v$ of $v$. Then it should also update the variable $q_u$ as follows: $q_u := q_v + q'$. Consequently, the value 
of $q_u$ may rise beyond $\frac{1}{2}$, or fall below $-\frac{1}{2}$. In such cases, if $q_u > \frac{1}{2}$ set 
$\tau_u := \tau_u + 1$ and $q_u := q_u - 1$. If $q_u < -\frac{1}{2}$ set $\tau_u := \tau_u - 1$ and $q_u := q_u + 1$. 
This completes the description of the additional step. We 
show that processors that perform synchronization using this step are indeed synchronized. 
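
The additional step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names relative_offset and adopt_clock are hypothetical, and exact fractions are used only to keep the arithmetic free of rounding.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def relative_offset(overlap, u_started_after_v):
    """The value q' computed from an overlap of length q in [1/2, 1]."""
    if not HALF <= overlap <= 1:
        raise ValueError("communication requires an overlap in [1/2, 1]")
    return 1 - overlap if u_started_after_v else overlap - 1


def adopt_clock(tau_v, q_v, q_prime):
    """u copies the clock of v, accumulates q', and carries whole units."""
    tau_u = tau_v
    q_u = q_v + q_prime
    if q_u > HALF:
        tau_u, q_u = tau_u + 1, q_u - 1
    elif q_u < -HALF:
        tau_u, q_u = tau_u - 1, q_u + 1
    return tau_u, q_u


q_prime = relative_offset(Fraction(3, 4), u_started_after_v=True)    # 1/4
no_carry = adopt_clock(10, Fraction(0), q_prime)                     # (10, 1/4)
carry_up = adopt_clock(10, Fraction(2, 5), q_prime)                  # (11, -7/20)
carry_down = adopt_clock(10, Fraction(-2, 5), -q_prime)              # (9, 7/20)
```

In each case the sum of the clock value and the fractional offset is unchanged by the carry, while the offset itself returns to the range $[-\frac{1}{2}, \frac{1}{2}]$.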

Lemma B.1. Suppose that processors $v_1, v_2, \ldots, v_\ell$ perform synchronization with the step described above. 
Then, at any time point after synchronization the difference between any two clocks is at most 1. 

Proof. Suppose without loss of generality that $v_1$ is the earliest processor. We prove by induction on the 
number of processors that the difference between the clocks of $v_1$ and $v_j$ is at most $\frac{1}{2}$, for 
$j = 1, 2, \ldots, \ell$. The base case is trivial. For the induction step, suppose that $j - 1$ processors 
$v_1, v_2, \ldots, v_{j-1}$ are synchronized with the earliest processor $v_1$. Observe that $q_{v_1} = 0$ since $v_1$ 
does not update its clock. Also, for $i = 1, 2, \ldots, j - 1$ it holds that $|\tau_{v_i} - \tau_{v_1}| \le 1$, and 
$\tau_{v_i} + q_{v_i} = \tau_{v_1}$ where $-\frac{1}{2} \le q_{v_i} \le \frac{1}{2}$. Suppose also that 
an additional processor $u = v_j$ performs synchronization with a processor $v_k$, for any $1 \le k \le j - 1$. The 
processor $u$ sets $\tau_u := \tau_{v_k}$ and $q_u := q_{v_k} + q'$. At this point it is possible that the clocks of 
$v_1$ and $u$ have a difference larger than $\frac{1}{2}$, but it is not greater than 1. However, once the step is 
completed, $q_u$ is updated in such a way that the difference becomes at most $\frac{1}{2}$, and it holds that 
$\tau_u + q_u = \tau_{v_1}$. □ 

Finally, we remark that the model is sufficiently expressive to capture an even more general case in 
which the clock speeds differ, as long as the ratio of different speeds is bounded by a constant. For full 
analysis see [2], Section 7. 